Smart Paper Notes

Nested bandits

Matthieu Martin , Panayotis Mertikopoulos , Thibaud Rahier , Houssam Zenati

Category: Machine Learning

2022-06-19

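In many online decision processes, the optimizing agent must choose among a large number of alternatives that share many inherent similarities. These similarities in turn imply closely correlated losses that can confound standard discrete choice models and bandit algorithms. We study this question in the context of nested bandits, a class of adversarial multi-armed bandit problems in which the learner seeks to minimize their regret in the presence of a large number of distinct alternatives with a hierarchy of embedded (non-combinatorial) similarities. In this setting, optimal algorithms based on the exponential weights blueprint (such as Hedge, EXP3, and their variants) may incur significant regret, because they tend to spend excessive time exploring irrelevant alternatives with similar, suboptimal costs. To address this, we propose a nested exponential weights (NEW) algorithm that performs a layered exploration of the learner's set of alternatives based on a nested, step-by-step selection method. In doing so, we obtain a series of tight bounds on the learner's regret, showing that online learning problems with a high degree of similarity between alternatives can be solved efficiently without incurring a red bus / blue bus paradox.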
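
The red bus / blue bus paradox mentioned in the abstract can be illustrated with a small sketch. This is not the paper's NEW algorithm; it only contrasts a flat exponential-weights (softmax) choice with a two-step, nested choice on a toy example. The function names, the utilities, and the rule of scoring a nest by its best member are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Exponential-weights probabilities for a list of scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flat_choice(utilities):
    """Flat choice over all alternatives; utilities maps name -> utility."""
    names = list(utilities)
    probs = softmax([utilities[n] for n in names])
    return dict(zip(names, probs))


def nested_choice(nests):
    """Two-step choice: first pick a nest, then an alternative inside it.

    nests maps nest name -> {alternative name -> utility}.
    Assumption for this sketch: a nest is scored by its best member,
    i.e. alternatives inside a nest are treated as near-duplicates.
    """
    nest_names = list(nests)
    nest_probs = softmax([max(nests[k].values()) for k in nest_names])
    result = {}
    for nest_name, p_nest in zip(nest_names, nest_probs):
        for alt, p_alt in flat_choice(nests[nest_name]).items():
            result[alt] = p_nest * p_alt
    return result


# Toy example: a car and two buses that differ only in colour.
flat = flat_choice({"car": 0.0, "red_bus": 0.0, "blue_bus": 0.0})
nested = nested_choice({
    "car": {"car": 0.0},
    "bus": {"red_bus": 0.0, "blue_bus": 0.0},
})
print(flat)    # each alternative gets 1/3, so "bus" as a whole gets 2/3
print(nested)  # car 1/2, each bus 1/4
```

In the flat model, duplicating an alternative inflates the total weight of its class, which is the paradox; the two-step selection keeps the split between classes unchanged.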

translated by Google Translate

A Large Scale Benchmark for Individual Treatment Effect Prediction and Uplift Modeling

Eustache Diemert , Artem Betlei , Christophe Renaudin , Massih-Reza Amini , Théophane Gregoir , Thibaud Rahier

Categories: Machine Learning (Statistics) | Artificial Intelligence | Machine Learning

2021-11-19

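Individual treatment effect (ITE) prediction is an important area of machine learning research that aims to explain and estimate the causal impact of an action at a granular level. It is a problem of interest in multiple application areas, such as healthcare, online advertising, and socioeconomics. To foster research on this topic, we release a publicly available collection of 13.9 million samples gathered from several randomized controlled trials, scaling up previously available datasets by a healthy factor of 210. We provide details on the data collection and perform sanity checks to validate the use of this data for causal inference tasks. First, we formalize the uplift modeling (UM) task that can be performed with this data, along with the relevant evaluation metrics. Then, we propose synthetic response surfaces and heterogeneous treatment assignment, providing a general setup for ITE prediction. Finally, we report experiments that validate key characteristics of the dataset and leverage its size to evaluate and compare, with high statistical significance, a selection of baseline UM and ITE prediction methods.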
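
As a rough illustration of what uplift means on randomized trial data, the sketch below estimates uplift per segment as the difference between the mean outcome of treated and control samples. The function name, the row layout `(segment, treated, outcome)`, and the toy data are assumptions for this example; they are not the dataset's schema or the paper's evaluation method.

```python
def uplift_by_segment(rows):
    """Estimate uplift per segment from randomized trial rows.

    rows: iterable of (segment, treated, outcome) tuples, where
    treated is truthy for the treatment group and outcome is numeric.
    Returns {segment: mean treated outcome - mean control outcome}
    for segments that contain both treated and control samples.
    """
    stats = {}
    for segment, treated, outcome in rows:
        # Per segment, keep [sum of outcomes, count] for each group.
        groups = stats.setdefault(segment, {True: [0.0, 0], False: [0.0, 0]})
        group = groups[bool(treated)]
        group[0] += outcome
        group[1] += 1

    result = {}
    for segment, groups in stats.items():
        sum_t, n_t = groups[True]
        sum_c, n_c = groups[False]
        if n_t and n_c:
            result[segment] = sum_t / n_t - sum_c / n_c
    return result


# Hypothetical toy data: (segment, treated, outcome).
toy_rows = [
    ("a", 1, 1), ("a", 1, 1), ("a", 0, 1), ("a", 0, 0),
    ("b", 1, 0), ("b", 1, 1), ("b", 0, 0), ("b", 0, 1),
]
print(uplift_by_segment(toy_rows))
```

Because treatment is assigned at random in a randomized trial, this difference in means is an unbiased estimate of the average effect within each segment; individual-level prediction needs a model on top of it.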


Acela: Predictable Datacenter-level Maintenance Job Scheduling

Yi Ding , Aijia Gao , Thibaud Ryden , Kaushik Mitra , Sukumar Kalmanje , Yanai Golany , Michael Carbin , Henry Hoffmann

Category: Machine Learning

2022-12-10

Datacenter operators ensure fair and regular server maintenance by using automated processes to schedule maintenance jobs to complete within a strict time budget. Automating this scheduling problem is challenging because maintenance job duration varies based on both job type and hardware. While it is tempting to use prior machine learning techniques for predicting job duration, we find that the structure of the maintenance job scheduling problem creates a unique challenge. In particular, we show that prior machine learning methods that produce the lowest error predictions do not produce the best scheduling outcomes due to asymmetric costs. Specifically, underpredicting maintenance job duration results in more servers being taken offline and longer server downtime than overpredicting maintenance job duration. The system cost of underprediction is much larger than that of overprediction. We present Acela, a machine learning system for predicting maintenance job duration, which uses quantile regression to bias duration predictions toward overprediction. We integrate Acela into a maintenance job scheduler and evaluate it on datasets from large-scale, production datacenters. Compared to machine learning based predictors from prior work, Acela reduces the number of servers that are taken offline by 1.87-4.28X, and reduces the server offline time by 1.40-2.80X.
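
To make the asymmetric-cost idea concrete, the sketch below implements the standard quantile (pinball) loss that quantile regression minimizes. It is a generic illustration, not Acela's implementation; the function name, the quantile level 0.9, and the toy durations are assumptions.

```python
def pinball_loss(y_true, y_pred, q):
    """Average quantile (pinball) loss at quantile level q in (0, 1).

    Underprediction (y > prediction) is weighted by q and
    overprediction by (1 - q), so a high q penalizes
    underprediction more heavily.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


# Hypothetical job that actually took 10 time units.
under = pinball_loss([10.0], [8.0], q=0.9)   # predicted 2 units too short
over = pinball_loss([10.0], [12.0], q=0.9)   # predicted 2 units too long
print(under, over)
```

With `q = 0.9`, an error of the same size costs nine times more when the duration is underpredicted, so a model trained on this loss leans toward overprediction.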


Industry-Scale Orchestrated Federated Learning for Drug Discovery

Martijn Oldenhof , Gergely Ács , Balázs Pejó , Ansgar Schuffenhauer , Nicholas Holway , Noé Sturm , Arne Dieckmann , Oliver Fortmeier , Eric Boniface , Clément Mayer

Categories: Machine Learning | Machine Learning (Statistics)

2022-10-17

To apply federated learning to drug discovery we developed a novel platform in the context of European Innovative Medicines Initiative (IMI) project MELLODDY (grant n°831472), which was comprised of 10 pharmaceutical companies, academic research labs, large industrial companies and startups. The MELLODDY platform was the first industry-scale platform to enable the creation of a global federated model for drug discovery without sharing the confidential data sets of the individual partners. The federated model was trained on the platform by aggregating the gradients of all contributing partners in a cryptographic, secure way following each training iteration. The platform was deployed on an Amazon Web Services (AWS) multi-account architecture running Kubernetes clusters in private subnets. Organisationally, the roles of the different partners were codified as different rights and permissions on the platform and administrated in a decentralized way. The MELLODDY platform generated new scientific discoveries which are described in a companion paper.


RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data

Thibaud Godon , Pier-Luc Plante , Baptiste Bauvin , Elina Francovic-Fontaine , Alexandre Drouin , Jacques Corbeil

Category: Machine Learning

2022-08-11

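Background: Understanding the relationship between omics data and phenotype is a central problem in precision medicine. The high dimensionality of metabolomics data challenges learning algorithms in terms of scalability and generalization, and most learning algorithms do not produce interpretable models. Method: We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules. Results: Applications to metabolomics data show that it produces models that achieve high predictive performance. The interpretability of the models makes them useful for biomarker discovery and for pattern discovery in high-dimensional data.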


VertXNet: Automatic Segmentation and Identification of Lumbar and Cervical Vertebrae from Spinal X-ray Images

Yao Chen , Yuanhan Mo , Aimee Readie , Gregory Ligozio , Thibaud Coroller , Bartlomiej W. Papiez

Category: Computer Vision

2022-07-12

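Manual annotation of vertebrae on spinal X-ray images is costly and time-consuming because of the complexity of bone shapes and variations in image quality. In this study, we address this challenge by proposing an ensemble method called VertXNet to automatically segment and label vertebrae in spinal X-ray images. VertXNet combines two state-of-the-art segmentation models, U-Net and Mask R-CNN, to improve vertebra segmentation. A main feature of VertXNet is that it also infers vertebra labels on a given spinal X-ray image, thanks to its Mask R-CNN component (trained to detect "reference" vertebrae). VertXNet was evaluated on an in-house dataset of lateral cervical and lumbar X-ray images for ankylosing spondylitis (AS). Our results show that VertXNet can accurately label spinal X-rays (mean Dice of 0.9). It can be used to circumvent the lack of annotated vertebrae without requiring review by human experts. This step is crucial for investigating clinical associations, because it addresses the lack of segmentation, a common bottleneck in most computational imaging projects.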
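
The Dice score quoted in the abstract measures the overlap between a predicted mask and a reference mask. A minimal sketch of the standard Dice coefficient on flattened binary masks follows; the function name and the toy masks are assumptions for illustration, and the paper's exact evaluation protocol (for example, averaging per vertebra or per image) is not specified here.

```python
def dice_coefficient(pred, truth):
    """Dice coefficient between two flattened binary masks.

    pred, truth: equal-length sequences of 0/1 (or falsy/truthy) values.
    Returns 2 * |intersection| / (|pred| + |truth|); two empty masks
    are treated as a perfect match (1.0) by convention in this sketch.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        return 1.0
    return 2.0 * intersection / size_sum


# Hypothetical 4-pixel masks: the prediction covers one extra pixel.
predicted = [1, 1, 0, 0]
reference = [1, 0, 0, 0]
print(dice_coefficient(predicted, reference))
```

A value of 1.0 means the masks coincide exactly, and 0.0 means they do not overlap at all.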


NeRF, meet differential geometry!

Thibaud Ehret , Roger Marí , Gabriele Facciolo

Category: Computer Vision

2022-06-29

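Neural radiance fields (NeRF) represent a breakthrough in novel view synthesis and in 3D modeling of complex scenes from multi-view image collections. Many recent works have focused on making these models more robust through regularization, so that they can be trained on possibly inconsistent and/or very sparse data. In this work, we scratch the surface of how differential geometry can provide regularization tools for robustly training NeRF-like models, which are modified so as to represent continuous and infinitely differentiable functions. In particular, we show how these tools yield a direct mathematical formalism for previously proposed NeRF variants aimed at improving performance in challenging conditions (i.e. RegNeRF). Building on this, we show how the same formalism can be used to encourage the regularity of surfaces (by means of Gaussian and mean curvatures), making it possible, for example, to learn surfaces from a very limited number of views.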


Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection

Samuele Cornell , Thomas Balestri , Thibaud Sénéchal

Category: Machine Learning

2021-11-20

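In many speech-enabled human-machine interaction scenarios, user speech can overlap with the device's playback audio. In these cases, the performance of tasks such as keyword spotting (KWS) and device-directed speech detection (DDD) can degrade significantly. To address this problem, we propose an implicit acoustic echo cancellation (iAEC) framework in which a neural network is trained to exploit the additional information from a reference microphone channel to learn to ignore the interfering signal and improve detection performance. We study this framework for the KWS and DDD tasks on, respectively, an augmented version of Google Speech Commands v2 and a real-world Alexa device dataset. Notably, we show a 56% reduction in false-reject rate for the DDD task during device playback conditions. We also show performance comparable or superior to a strong end-to-end neural echo cancellation + KWS baseline on the KWS task, with an order of magnitude lower computational requirements.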
